# Multimodal Medical Analysis

MedGemma is a multimodal AI model developed by Google for the medical domain. It is based on the Gemma 3 architecture and focuses on medical text and image understanding.

Google.medgemma 4b It GGUF

MedGemma-4B-IT is a medical-focused image-to-text generation model developed by Google.

MedGemma is a series of medical multimodal models built on Gemma 3 and optimized for medical text and image understanding, available in 4B and 27B parameter versions.

Dermatech Qwen2 VL 2B I1 GGUF

This is a multimodal model based on the Qwen2 architecture, focusing on text generation, image-to-text, and visual question answering tasks.

Image-to-Text English

Llama 3.2 11B Vision Radiology Mini

A model for assisted interpretation of radiology images, fine-tuned from unsloth/Llama-3.2-11B-Vision-Instruct, with runtime speed reportedly doubled through optimization.

Transformers English

A multimodal large language model (MLLM) specifically designed for interpreting electrocardiogram (ECG) images, capable of handling various ECG-related tasks from diverse data sources.

Image-to-Text English

Llava Med V1.5 Mistral 7b

LLaVA-Med is a large language-vision biomedical assistant trained through curriculum learning, specifically designed for biomedical visual question answering tasks.

Chinese LLaVA Med 7B

A Chinese medical multimodal large language model based on the LLaVA-1.5 architecture, focusing on visual question answering tasks in the medical field.

Transformers Chinese

Chexpert Mimic Cxr Impression Baseline

A text generation model that produces radiology impression reports from chest X-ray images.

Transformers English

Llava Roco 8bit

BabyDoctor is a multimodal large language model that combines the capabilities of CLIP and LLaMA 2. It can understand and generate text while also comprehending images. The model has been fine-tuned specifically for interpreting radiology images such as X-rays, ultrasounds, MRIs, and CT scans.

Transformers English

RCLIP is a vision-language model fine-tuned from CLIP and optimized for medical image analysis in the radiology domain.

Transformers English

Quiltnet B 16 PMB

A multimodal foundation model that pairs a ViT-B/16 visual encoder with a PubMedBERT text encoder, trained on the Quilt-1M pathology video dataset.

A CLIP ViT-B/32 vision-language foundation model trained on the Quilt-1M pathology video dataset and designed for histological analysis.

© 2025 AIbase